Hierarchical policy design for sample-efficient learning of robot table tennis through self-play
Training robots with physical bodies requires developing new methods and action representations that allow the learning agents to explore the space of policies efficiently. This work studies sample-efficient learning of complex policies in the context of robot table tennis. It incorporates learning into a hierarchical control framework using a model-free strategy layer (which requires complex reasoning about opponents that is difficult to do in a model-based way), model-based prediction of external objects (which are difficult to control directly with analytic control methods, but governed by learnable and relatively simple laws of physics), and analytic controllers for the robot itself. Human demonstrations are used to train dynamics models, which together with the analytic controller allow any physically capable robot to play table tennis without training episodes. Using only about 7,000 demonstrated trajectories, a striking policy can hit ball targets with about 20 cm error. Self-play is used to train cooperative and adversarial strategies on top of model-based striking skills trained from human demonstrations. After only about 24,000 strikes in self-play, the agent learns to best exploit the human dynamics models for longer cooperative games. Further experiments demonstrate that more flexible variants of the policy can discover new strikes not demonstrated by humans and achieve higher performance at the expense of lower sample-efficiency. Experiments are carried out in a virtual reality environment using sensory observations that are obtainable in the real world. The high sample-efficiency demonstrated in the evaluations shows that the proposed method is suitable for learning directly on physical robots without transferring models or policies from simulation.
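The three-layer decomposition the abstract describes can be sketched as follows. This is an illustrative stand-in, not the paper's code: the class names are hypothetical, the learned ball-dynamics model is replaced by a trivial ballistic fall, and the strategy layer returns a fixed target where the paper would use a policy trained by self-play.

```python
import numpy as np

class BallDynamicsModel:
    """Model-based layer: predicts where the ball lands from its current state.
    A trivial ballistic stand-in for the learned dynamics model."""
    GRAVITY = 9.81

    def predict_landing(self, pos, vel):
        # Time for the ball to fall from height pos[2] to table height z = 0.
        t = (vel[2] + np.sqrt(vel[2] ** 2 + 2 * self.GRAVITY * pos[2])) / self.GRAVITY
        return pos[:2] + vel[:2] * t

class StrategyLayer:
    """Model-free layer: picks a target on the opponent's side.
    A learned policy in the paper; a fixed aim point here."""
    def choose_target(self, predicted_landing):
        return np.array([2.0, 0.0])

class AnalyticController:
    """Analytic layer: turns a predicted ball state and a strike target
    into a paddle command, with no learning involved."""
    def strike_command(self, ball_landing, target):
        return {"paddle_pos": ball_landing, "aim": target}

# One decision cycle: strategy on top, learned model in the middle, analytic control below.
ball_pos, ball_vel = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.5])
landing = BallDynamicsModel().predict_landing(ball_pos, ball_vel)
cmd = AnalyticController().strike_command(landing, StrategyLayer().choose_target(landing))
```

The point of the split is that each layer is trained (or not trained) with the tool that suits it: only the strategy layer needs reinforcement learning, which is what keeps the sample count low.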
Geometry-Based Next Frame Prediction from Monocular Video
We consider the problem of next frame prediction from video input. A
recurrent convolutional neural network is trained to predict depth from
monocular video input, which, along with the current video image and the camera
trajectory, can then be used to compute the next frame. Unlike prior next-frame
prediction approaches, we take advantage of the scene geometry and use the
predicted depth for generating the next frame prediction. Our approach can
produce rich next frame predictions which include depth information attached to
each pixel. Another novel aspect of our approach is that it predicts depth from
a sequence of images (e.g. in a video), rather than from a single still image.
We evaluate the proposed approach on the KITTI dataset, a standard dataset for
benchmarking tasks relevant to autonomous driving. The proposed method produces
results which are visually and numerically superior to existing methods that
directly predict the next frame. We show that the accuracy of depth prediction
improves as more prior frames are considered.
Comment: To appear in the 2017 IEEE Intelligent Vehicles Symposium.
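The geometric warp this approach builds on can be written down directly: a pixel with known depth is back-projected to 3D through the camera intrinsics, moved by the camera motion, and projected into the next frame. A minimal sketch, with illustrative KITTI-like intrinsics and a simple forward motion (none of these values come from the paper):

```python
import numpy as np

def reproject(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth into the next camera's image plane."""
    p = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-project to a viewing ray
    X = depth * p                                  # 3D point in the current camera frame
    X2 = R @ X + t                                 # express it in the next camera frame
    q = K @ X2                                     # project into the next image
    return q[:2] / q[2]

K = np.array([[721.5, 0.0, 609.6],   # illustrative KITTI-like intrinsics
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                         # no rotation between frames
t = np.array([0.0, 0.0, -1.0])        # camera advances 1 m along its optical axis

# An off-center pixel at 10 m depth drifts outward as the camera approaches it.
u2, v2 = reproject(700.0, 172.9, 10.0, K, R, t)
```

Applying this warp to every pixel of the current frame, using the network's predicted depth map, yields the geometry-based next-frame prediction; only the depth is learned, the warp itself is analytic.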
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task both
for indoor and outdoor robot navigation. In this work we address unsupervised
learning of scene depth and robot ego-motion where supervision is provided by
monocular videos, as cameras are the cheapest, least restrictive and most
ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach which produces higher
quality results, is able to model moving objects and is shown to transfer
across data domains, e.g. from outdoors to indoor scenes. The main idea is to
introduce geometric structure in the learning process, by modeling the scene
and the individual objects; camera ego-motion and object motions are learned
from monocular videos as input. Furthermore an online refinement method is
introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion e.g. through learned flow. Our results are comparable
in quality to the ones which used stereo as supervision and significantly
improve depth prediction on scenes and datasets which contain a lot of object
motion. The approach is of practical relevance, as it allows transfer across
environments, by transferring models trained on data collected for robot
navigation in urban scenes to indoor navigation settings. The code associated
with this paper can be found at https://sites.google.com/view/struct2depth.
Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19).
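The supervision signal in this line of work comes from view synthesis rather than ground-truth depth: a source frame is warped into the target view using the predicted depth and ego-motion, and the photometric difference between the warped and the real target frame is minimized. A minimal sketch of that loss (illustrative only, not the struct2depth code, which additionally models per-object motion and uses an SSIM term):

```python
import numpy as np

def photometric_loss(target, warped, mask=None):
    """Mean absolute photometric error between the target frame and the
    source frame warped into the target view; an optional mask restricts
    the loss to pixels that warped to valid image locations."""
    err = np.abs(target.astype(np.float64) - warped.astype(np.float64))
    if mask is not None:
        err = err[mask]
    return err.mean()

target = np.zeros((4, 4))
perfect_warp = np.zeros((4, 4))           # correct depth/motion reproduces the frame
bad_warp = np.full((4, 4), 10.0)          # wrong depth/motion misaligns pixels
assert photometric_loss(target, perfect_warp) == 0.0
loss = photometric_loss(target, bad_warp)
```

Because this loss needs nothing but the monocular video itself, the same objective can also be minimized at test time, which is what makes the online refinement to unknown domains possible.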
A phase III, randomized, two-armed, double-blind, parallel, active controlled, and non-inferiority clinical trial to compare efficacy and safety of biosimilar adalimumab (CinnoRA®) to the reference product (Humira®) in patients with active rheumatoid arthritis
Background: This study aimed to compare the efficacy and safety of test-adalimumab (CinnoRA®, CinnaGen, Iran) to the innovator product (Humira®, AbbVie, USA) in adult patients with active rheumatoid arthritis (RA). Methods: In this randomized, double-blind, active-controlled, non-inferiority trial, a total of 136 patients with active RA were randomized to receive 40 mg subcutaneous injections of either CinnoRA® or Humira® every other week, while receiving methotrexate (15 mg/week), folic acid (1 mg/day), and prednisolone (7.5 mg/day) over a period of 24 weeks. Physical examinations, vital sign evaluations, and laboratory tests were conducted in patients at baseline and at the 12-week and 24-week visits. The primary endpoint was the proportion of patients achieving a good or moderate European League Against Rheumatism (EULAR) response based on the Disease Activity Score in 28 joints using erythrocyte sedimentation rate (DAS28-ESR). The secondary endpoints were the proportion of patients achieving American College of Rheumatology 20% (ACR20), 50% (ACR50), and 70% (ACR70) responses, along with the disability index of the health assessment questionnaire (HAQ), and safety. Results: Patients randomized to the CinnoRA® or Humira® arms had comparable demographic information, laboratory results, and disease characteristics at baseline. The proportion of patients achieving good and moderate EULAR responses in the CinnoRA® group was non-inferior to the Humira® group at 12 and 24 weeks in both the intention-to-treat (ITT) and per-protocol (PP) populations (all p values >0.05). No significant difference was noted in the proportion of patients attaining ACR20, ACR50, and ACR70 responses in the CinnoRA® and Humira® groups (all p values >0.05). Further, the difference in HAQ scores and safety outcome measures between treatment arms was not statistically significant.
Conclusion: CinnoRA® was shown to be non-inferior to Humira® in terms of efficacy at week 24, with a comparable safety profile to the reference product.
Robotic Table Tennis: A Case Study into a High Speed Learning System
We present a deep-dive into a real-world robotic learning system that, in
previous work, was shown to be capable of hundreds of table tennis rallies with
a human and has the ability to precisely return the ball to desired targets.
This system puts together a highly optimized perception subsystem, a high-speed
low-latency robot controller, a simulation paradigm that can prevent damage in
the real world and also train policies for zero-shot transfer, and automated
real world environment resets that enable autonomous training and evaluation on
physical robots. We complement a complete system description, including
numerous design decisions that are typically not widely disseminated, with a
collection of studies that clarify the importance of mitigating various sources
of latency, accounting for training and deployment distribution shifts,
robustness of the perception system, sensitivity to policy hyper-parameters,
and choice of action space. A video demonstrating the components of the system
and details of experimental results can be found at
https://youtu.be/uFcnWjB42I0.
Comment: Published and presented at Robotics: Science and Systems (RSS 2023).
Revisiting Multi-Scale Feature Fusion for Semantic Segmentation
It is commonly believed that high internal resolution combined with expensive
operations (e.g. atrous convolutions) are necessary for accurate semantic
segmentation, resulting in slow speed and large memory usage. In this paper, we
question this belief and demonstrate that neither high internal resolution nor
atrous convolutions are necessary. Our intuition is that although segmentation
is a dense per-pixel prediction task, the semantics of each pixel often depend
on both nearby neighbors and far-away context; therefore, a more powerful
multi-scale feature fusion network plays a critical role. Following this
intuition, we revisit the conventional multi-scale feature space (typically
capped at P5) and extend it to a much richer space, up to P9, where the
smallest features are only 1/512 of the input size and thus have very large
receptive fields. To process such a rich feature space, we leverage the recent
BiFPN to fuse the multi-scale features. Based on these insights, we develop a
simplified segmentation model, named ESeg, which has neither high internal
resolution nor expensive atrous convolutions. Perhaps surprisingly, our simple
method can achieve better accuracy with faster speed than prior art across
multiple datasets. In real-time settings, ESeg-Lite-S achieves 76.0% mIoU on
CityScapes [12] at 189 FPS, outperforming FasterSeg [9] (73.1% mIoU at 170
FPS). Our ESeg-Lite-L runs at 79 FPS and achieves 80.1% mIoU, largely closing
the gap between real-time and high-performance segmentation models.
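The pyramid levels referred to above follow the usual convention that level Pk is downsampled by a factor of 2**k relative to the input, which is where the 1/512 figure for P9 comes from. A small sketch of the resulting feature-map sizes (the helper name and the input resolution are illustrative):

```python
def pyramid_shapes(height, width, levels=range(3, 10)):
    """Spatial size of each feature level Pk, where Pk has stride 2**k
    relative to the input image."""
    return {f"P{k}": (height // 2 ** k, width // 2 ** k) for k in levels}

# A Cityscapes-sized input: P5 (the conventional cap) is 1/32 scale,
# while P9 shrinks to 1/512 scale, i.e. just 2 x 4 for this input.
shapes = pyramid_shapes(1024, 2048)
```

Maps that small cover the whole image with a handful of features, which is how the extended pyramid supplies far-away context without high internal resolution or atrous convolutions.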